27. Identify the Most Powerful Features


Question:

Take your (overfit) decision tree and use its feature_importances_ attribute to get the relative importance of every feature it uses. Because this is text data, the list is long, so we suggest iterating through it and printing an importance only if it is above some threshold (say, 0.2). Remember that the importances sum to 1, so if all words were equally important, each one would have an importance of far less than 0.01. What is the importance of the most important feature? What is the number (index) of this feature?
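A minimal sketch of the thresholded loop described above, written as two small helpers so it runs with the standard library alone. The helper names important_features and most_important are hypothetical, and the sketch assumes your fitted tree in find_signature.py is stored in a variable called clf (check the actual name in your copy). feature_importances_ is a NumPy array, which enumerate, len, and indexing handle the same way as a plain list.

```python
def important_features(importances, threshold=0.2):
    """Return (index, importance) pairs whose importance exceeds threshold."""
    return [(i, imp) for i, imp in enumerate(importances) if imp > threshold]


def most_important(importances):
    """Return (index, importance) for the single largest importance."""
    idx = max(range(len(importances)), key=lambda i: importances[i])
    return idx, importances[idx]


if __name__ == "__main__":
    # Toy importances for illustration only; in find_signature.py you would
    # pass clf.feature_importances_ (assuming the tree is named clf).
    toy = [0.0, 0.05, 0.7, 0.25]
    for index, importance in important_features(toy):
        print(index, importance)
    print("most important:", most_important(toy))
```

The index printed for each feature is its column position in the vectorized data, which is the "number" the question asks for.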


INSTRUCTOR NOTE:

Special Note: Depending on when you downloaded the code provided for find_signature.py, you may need to change lines 9-10 of that file to:

words_file = "../text_learning/your_word_data.pkl"
authors_file = "../text_learning/your_email_authors.pkl"

so that the script reads the files created by running vectorize_text.py.